智能论文笔记

Multi-stage Information Retrieval for Vietnamese Legal Texts

Nhat-Minh Pham , Ha-Thanh Nguyen , Trong-Hop Do

分类：自然语言处理

2022-09-29

这项研究涉及越南法律文本的信息检索问题（IR）。尽管对多种语言进行了很好的研究，但信息检索仍未受到越南研究界的关注。对于很难处理的法律文件，尤其如此。这项研究提出了一种使用句子转换器的越南法律文件的信息检索的新方法。此外，还进行了各种实验，以对不同的变压器模型，排名得分，音节级别和单词级训练进行比较。实验结果表明，所提出的模型优于当前有关越南文档信息检索的研究模型。

translated by 谷歌翻译

MANDO: Multi-Level Heterogeneous Graph Embeddings for Fine-Grained Detection of Smart Contract Vulnerabilities

Hoang H. Nguyen , Nhat-Minh Nguyen , Chunyao Xie , Zahra Ahmadi , Daniel Kudendo , Thanh-Nam Doan , Lingxiao Jiang

分类：机器学习

2022-08-28

由不同类型的节点和边缘组成的学习异质图增强了均匀图技术的结果。这样的图形的一个有趣示例是代表可能的软件代码执行流的控制流图。由于此类图代表了代码的更多语义信息，因此为这些图形开发技术和工具可能对检测软件中的漏洞的可靠性非常有益。但是，现有的异质图技术仍然不足以处理复杂的图形，在处理复杂的图形中，不同类型的节点和边缘数量较大且可变。本文集中于以太坊智能合约作为由构建在控制流图和包含不同类型的节点和链接的呼叫图的异质合同图表示的软件代码样本。我们提出了曼多（Mando），这是一种新的异质图表示，以学习这种异质合同图的结构。 Mando提取自定义的Metapaths，该Metapaths在不同类型的节点及其邻居之间建立了关系连接。此外，它开发了一个多米达异构图注意网络，以学习不同类型的节点及其在异质合同图中的多层嵌入，可以更准确地捕获智能合约的代码语义，并便利两者。 - 水平和粗粒合同级别的漏洞检测。我们对大型智能合同数据集的广泛评估表明，曼多（Mando）在粗粒合同水平上改善了其他技术的脆弱性检测结果。更重要的是，它是第一种基于学习的方法，能够在细粒度的线条层面上识别漏洞，并在F1分数方面将基于代码分析的传统漏洞检测方法显着提高了11.35％至70.81％。

translated by 谷歌翻译

Machine Learning Approach to Polymerization Reaction Engineering: Determining Monomers Reactivity Ratios

Tung Nguyen , Mona Bavarian

分类：机器学习

2023-01-03

Here, we demonstrate how machine learning enables the prediction of comonomers reactivity ratios based on the molecular structure of monomers. We combined multi-task learning, multi-inputs, and Graph Attention Network to build a model capable of predicting reactivity ratios based on the monomers chemical structures.

translated by 谷歌翻译

Neural Collapse in Deep Linear Network: From Balanced to Imbalanced Data

Hien Dang , Tan Nguyen , Tho Tran , Hung Tran , Nhat Ho

分类：机器学习 | (统计)机器学习

2023-01-01

Modern deep neural networks have achieved superhuman performance in tasks from image classification to game play. Surprisingly, these various complex systems with massive amounts of parameters exhibit the same remarkable structural properties in their last-layer features and classifiers across canonical datasets. This phenomenon is known as "Neural Collapse," and it was discovered empirically by Papyan et al. \cite{Papyan20}. Recent papers have theoretically shown the global solutions to the training network problem under a simplified "unconstrained feature model" exhibiting this phenomenon. We take a step further and prove the Neural Collapse occurrence for deep linear network for the popular mean squared error (MSE) and cross entropy (CE) loss. Furthermore, we extend our research to imbalanced data for MSE loss and present the first geometric analysis for Neural Collapse under this setting.

translated by 谷歌翻译

Integrating Semantic Information into Sketchy Reading Module of Retro-Reader for Vietnamese Machine Reading Comprehension

Hang Thi-Thu Le , Viet-Duc Ho , Duc-Vu Nguyen , Ngan Luu-Thuy Nguyen

分类：自然语言处理

2023-01-01

Machine Reading Comprehension has become one of the most advanced and popular research topics in the fields of Natural Language Processing in recent years. The classification of answerability questions is a relatively significant sub-task in machine reading comprehension; however, there haven't been many studies. Retro-Reader is one of the studies that has solved this problem effectively. However, the encoders of most traditional machine reading comprehension models in general and Retro-Reader, in particular, have not been able to exploit the contextual semantic information of the context completely. Inspired by SemBERT, we use semantic role labels from the SRL task to add semantics to pre-trained language models such as mBERT, XLM-R, PhoBERT. This experiment was conducted to compare the influence of semantics on the classification of answerability for the Vietnamese machine reading comprehension. Additionally, we hope this experiment will enhance the encoder for the Retro-Reader model's Sketchy Reading Module. The improved Retro-Reader model's encoder with semantics was first applied to the Vietnamese Machine Reading Comprehension task and obtained positive results.

translated by 谷歌翻译

Leveraging Semantic Representations Combined with Contextual Word Representations for Recognizing Textual Entailment in Vietnamese

Quoc-Loc Duong , Duc-Vu Nguyen , Ngan Luu-Thuy Nguyen

分类：自然语言处理

2023-01-01

RTE is a significant problem and is a reasonably active research community. The proposed research works on the approach to this problem are pretty diverse with many different directions. For Vietnamese, the RTE problem is moderately new, but this problem plays a vital role in natural language understanding systems. Currently, methods to solve this problem based on contextual word representation learning models have given outstanding results. However, Vietnamese is a semantically rich language. Therefore, in this paper, we want to present an experiment combining semantic word representation through the SRL task with context representation of BERT relative models for the RTE problem. The experimental results give conclusions about the influence and role of semantic representation on Vietnamese in understanding natural language. The experimental results show that the semantic-aware contextual representation model has about 1% higher performance than the model that does not incorporate semantic representation. In addition, the effects on the data domain in Vietnamese are also higher than those in English. This result also shows the positive influence of SRL on RTE problem in Vietnamese.

translated by 谷歌翻译

Is word segmentation necessary for Vietnamese sentiment classification?

Duc-Vu Nguyen , Ngan Luu-Thuy Nguyen

分类：自然语言处理

2023-01-01

To the best of our knowledge, this paper made the first attempt to answer whether word segmentation is necessary for Vietnamese sentiment classification. To do this, we presented five pre-trained monolingual S4- based language models for Vietnamese, including one model without word segmentation, and four models using RDRsegmenter, uitnlp, pyvi, or underthesea toolkits in the pre-processing data phase. According to comprehensive experimental results on two corpora, including the VLSP2016-SA corpus of technical article reviews from the news and social media and the UIT-VSFC corpus of the educational survey, we have two suggestions. Firstly, using traditional classifiers like Naive Bayes or Support Vector Machines, word segmentation maybe not be necessary for the Vietnamese sentiment classification corpus, which comes from the social domain. Secondly, word segmentation is necessary for Vietnamese sentiment classification when word segmentation is used before using the BPE method and feeding into the deep learning model. In this way, the RDRsegmenter is the stable toolkit for word segmentation among the uitnlp, pyvi, and underthesea toolkits.

translated by 谷歌翻译

DRG-Net: Interactive Joint Learning of Multi-lesion Segmentation and Classification for Diabetic Retinopathy Grading

Hasan Md Tusfiqur , Duy M. H. Nguyen , Mai T. N. Truong , Triet A. Nguyen , Binh T. Nguyen , Michael Barz , Hans-Juergen Profitlich , Ngoc T. T. Than , Ngan Le , Pengtao Xie

分类：计算机视觉

2022-12-30

Diabetic Retinopathy (DR) is a leading cause of vision loss in the world, and early DR detection is necessary to prevent vision loss and support an appropriate treatment. In this work, we leverage interactive machine learning and introduce a joint learning framework, termed DRG-Net, to effectively learn both disease grading and multi-lesion segmentation. Our DRG-Net consists of two modules: (i) DRG-AI-System to classify DR Grading, localize lesion areas, and provide visual explanations; (ii) DRG-Expert-Interaction to receive feedback from user-expert and improve the DRG-AI-System. To deal with sparse data, we utilize transfer learning mechanisms to extract invariant feature representations by using Wasserstein distance and adversarial learning-based entropy minimization. Besides, we propose a novel attention strategy at both low- and high-level features to automatically select the most significant lesion information and provide explainable properties. In terms of human interaction, we further develop DRG-Net as a tool that enables expert users to correct the system's predictions, which may then be used to update the system as a whole. Moreover, thanks to the attention mechanism and loss functions constraint between lesion features and classification features, our approach can be robust given a certain level of noise in the feedback of users. We have benchmarked DRG-Net on the two largest DR datasets, i.e., IDRID and FGADR, and compared it to various state-of-the-art deep learning networks. In addition to outperforming other SOTA approaches, DRG-Net is effectively updated using user feedback, even in a weakly-supervised manner.

translated by 谷歌翻译

Multimodal Wildland Fire Smoke Detection

Siddhant Baldota , Shreyas Anantha Ramaprasad , Jaspreet Kaur Bhamra , Shane Luna , Ravi Ramachandra , Eugene Zen , Harrison Kim , Daniel Crawl , Ismael Perez , Ilkay Altintas

分类：计算机视觉

2022-12-29

Research has shown that climate change creates warmer temperatures and drier conditions, leading to longer wildfire seasons and increased wildfire risks in the United States. These factors have in turn led to increases in the frequency, extent, and severity of wildfires in recent years. Given the danger posed by wildland fires to people, property, wildlife, and the environment, there is an urgency to provide tools for effective wildfire management. Early detection of wildfires is essential to minimizing potentially catastrophic destruction. In this paper, we present our work on integrating multiple data sources in SmokeyNet, a deep learning model using spatio-temporal information to detect smoke from wildland fires. Camera image data is integrated with weather sensor measurements and processed by SmokeyNet to create a multimodal wildland fire smoke detection system. We present our results comparing performance in terms of both accuracy and time-to-detection for multimodal data vs. a single data source. With a time-to-detection of only a few minutes, SmokeyNet can serve as an automated early notification system, providing a useful tool in the fight against destructive wildfires.

translated by 谷歌翻译

Multisensor Data Fusion for Reliable Obstacle Avoidance

Thanh Nguyen Canh , Truong Son Nguyen , Cong Hoang Quach , Xiem HoangVan , Manh Duong Phung

分类：机器人

2022-12-26

In this work, we propose a new approach that combines data from multiple sensors for reliable obstacle avoidance. The sensors include two depth cameras and a LiDAR arranged so that they can capture the whole 3D area in front of the robot and a 2D slide around it. To fuse the data from these sensors, we first use an external camera as a reference to combine data from two depth cameras. A projection technique is then introduced to convert the 3D point cloud data of the cameras to its 2D correspondence. An obstacle avoidance algorithm is then developed based on the dynamic window approach. A number of experiments have been conducted to evaluate our proposed approach. The results show that the robot can effectively avoid static and dynamic obstacles of different shapes and sizes in different environments.

translated by 谷歌翻译